Addressing Environment Non-Stationarity by Repeating Q-learning Updates
Authors
Abstract
Q-learning (QL) is a popular reinforcement learning algorithm that is guaranteed to converge to optimal policies in Markov decision processes. However, QL exhibits an artifact: in expectation, the effective rate of updating the value of an action depends on the probability of choosing that action. In other words, there is a tight coupling between the learning dynamics and underlying execution policy. This coupling can cause performance degradation in noisy non-stationary environments. Here, we introduce Repeated Update Q-learning (RUQL), a learning algorithm that resolves the undesirable artifact of Q-learning while maintaining simplicity. We analyze the similarities and differences between RUQL, QL, and the closest state-of-the-art algorithms theoretically. Our analysis shows that RUQL maintains the convergence guarantee of QL in stationary environments, while relaxing the coupling between the execution policy and the learning dynamics. Experimental results confirm the theoretical insights and show how RUQL outperforms both QL and the closest state-of-the-art algorithms in noisy non-stationary environments.
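As a rough illustration of the coupling described above, the sketch below contrasts a standard Q-learning update with a repeated-update variant in which the step is effectively applied 1/π(s,a) times. This is a minimal sketch under that assumption, not the paper's exact formulation; the function names, the closed-form weight, and the choice of 1/π(s,a) repetitions are illustrative.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """Standard Q-learning: move Q(s,a) one step toward the bootstrapped target."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

def repeated_update(Q, s, a, r, s_next, alpha, gamma, pi_sa):
    """Repeated-update sketch: apply the Q-learning step as if it were repeated
    1/pi_sa times, so rarely selected actions are not updated more slowly in
    expectation.  Repeating Q <- (1-alpha)Q + alpha*target n times gives
    Q <- (1-alpha)^n Q + (1-(1-alpha)^n) target; here n = 1/pi_sa is an
    illustrative choice, not necessarily the paper's exact rule."""
    target = r + gamma * np.max(Q[s_next])
    keep = (1.0 - alpha) ** (1.0 / pi_sa)  # weight left on the old estimate
    Q[s, a] = keep * Q[s, a] + (1.0 - keep) * target

# Tiny usage example on a 2-state, 2-action table.
Q = np.zeros((2, 2))
q_learning_update(Q, s=0, a=1, r=1.0, s_next=1, alpha=0.1, gamma=0.9)
repeated_update(Q, s=0, a=1, r=1.0, s_next=1, alpha=0.1, gamma=0.9, pi_sa=0.2)
```

Note how the residual weight on the old estimate shrinks as pi_sa gets smaller, so the effective learning rate no longer depends on how often the action is selected.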
Similar resources
Addressing the policy-bias of Q-learning by repeating updates
Q-learning is a very popular reinforcement learning algorithm that has been proven to converge to optimal policies in Markov decision processes. However, Q-learning shows artifacts in non-stationary environments, e.g., the probability of playing the optimal action may decrease if Q-values deviate significantly from the true values, a situation that may arise in the initial phase as well as after change...
Adaptive Multiagent Q-Learning with Initial Heuristic Approximation
The problem of effective coordination learning of multiple autonomous agents in a multiagent system (MAS) is one of the most complex challenges in artificial intelligence because of two principal obstacles: non-stationarity of the environment and exponential growth of its dimensionality with the number of agents. Non-stationarity of the environment is due to the dependence of the transition function ...
Multiagent Learning
One of the greatest difficulties in multiagent learning is that the environment is not stationary with respect to the agent. In the case of single-agent learning problems, the agent has to maximize its expected reward with respect to an environment that is stationary. In the case of multiagent scenarios, all the agents learning simultaneously poses a problem of non-stationarity in the environment w...
Addressing Function Approximation Error in Actor-Critic Methods
In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and critic. Our algorithm takes the minimum value between a pair of critics to restrict...
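The pair-of-critics mechanism mentioned in this abstract can be illustrated with a short sketch: the bootstrapped target uses the minimum of two critic estimates. The function name, the `done` masking, and the scalar inputs are illustrative assumptions, not the paper's code.

```python
import numpy as np

def clipped_double_q_target(r, q1_next, q2_next, gamma, done):
    """Bootstrap from the minimum of two critic estimates of the next
    state-action value, restricting the overestimation that a single
    over-approximating critic would otherwise propagate."""
    q_next = np.minimum(q1_next, q2_next)
    return r + gamma * (1.0 - done) * q_next

# Example: the two critics disagree; the target uses the more pessimistic one.
print(clipped_double_q_target(r=1.0, q1_next=5.2, q2_next=4.7, gamma=0.99, done=0.0))
```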
Incremental Sensorimotor Learning with Constant Update Complexity
The field of robotics is increasingly moving toward applications that involve unstructured human environments. This domain is challenging from a learning perspective, since subsequent observations are dependent and the environment is typically non-stationary. This non-stationarity is not limited to the external environment, as internal sensorimotor relationships may be subject to change as well...
Journal: Journal of Machine Learning Research
Volume: 17
Issue:
Pages: -
Publication date: 2016